Add stats to track number of open files and max open file limit of validator process #28945
00f8c29 to e87ce35
It seems that procfs.fd_count doesn't track files opened by mmap.
To get the number of mmapped files, use the following command.
|
core/src/system_monitor_service.rs
Outdated
let curr_num_open_fd = proc.fd_count().unwrap();
let curr_mmap_count = proc.maps().unwrap().len();
Have you measured how long this takes to execute with 2M open fds?
Yeah. It takes about 10.520629985s for 2M open fds. Maybe we can increase the report interval from 30s to 3 minutes?
may very well miss an event on a three minute interval
This all seems pretty blunt. Beyond append vecs, which seem like they would be covered by #28958, what other sources of unbounded fd/mmap growth exist? Perhaps we can directly instrument those as well.
the dumb wc -l /proc/PID/maps is like 600ms (850ms if the file is copied aside first) on a box with 505k maps. should be linear to 2M, ~2.5s. i'm guessing it's all the parsing that's of no use to us which is accounting for the rest of that time?
$ time cat maps | sed -e 's:.*/\(accounts\|accounts_index\)/.*:\1:;t;c\
other' | sort | uniq -c
416730 accounts
82965 accounts_index
4980 other
sol@kin-validator-am6-1:~$ cat /proc/30565/maps| sed -e 's:.*/\(accounts\|accounts_index\)/.*:\1:;t;c\other' | sort | uniq -c
87885 accounts
191468 accounts_index
3399 other
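The same bucketing can be done in-process. The following is a minimal sketch, not code from this PR: `MapCounts` and `classify_maps` are hypothetical names, and the match is a plain substring test on `/accounts/` and `/accounts_index/`, which only approximates the sed pattern above (a path containing both components may be bucketed differently).

```rust
/// Per-category mapping counts, mirroring the three buckets in the sed output.
/// Hypothetical type, used for illustration only.
#[derive(Debug, Default, PartialEq)]
pub struct MapCounts {
    pub accounts: usize,
    pub accounts_index: usize,
    pub other: usize,
}

/// Classify each line of a `/proc/<pid>/maps` dump by a directory component
/// appearing anywhere in the line. Anonymous mappings and shared objects
/// fall into `other`, as they do with the sed command.
pub fn classify_maps(maps: &str) -> MapCounts {
    let mut counts = MapCounts::default();
    for line in maps.lines() {
        if line.contains("/accounts_index/") {
            counts.accounts_index += 1;
        } else if line.contains("/accounts/") {
            counts.accounts += 1;
        } else {
            counts.other += 1;
        }
    }
    counts
}
```

On Linux, the input could come from `std::fs::read_to_string("/proc/self/maps")`, at the cost of holding the whole dump in memory.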
thanks! I tried @t-nelson's command. Looks like #28958 tracks the append-vecs correctly. The other large chunk of mmaps is used by accounts_index, 191K. @jeffwashington, we should probably track the number of accounts_index mmap files too?
> the dumb wc -l /proc/PID/maps is like 600ms (850ms if the file is copied aside first) on a box with 505k maps. should be linear to 2M, ~2.5s. i'm guessing it's all the parsing that's of no use to us which is accounting for the rest of that time?
yeah. it takes 1.51s to count 2M open mmaps.
> thanks! I tried @t-nelson's command. Looks like #28958 tracks the append-vecs correctly. The other large chunk of mmaps is used by accounts_index, 191K. @jeffwashington, we should probably track the number of accounts_index mmap files too?
Here is the change to track accounts index mmap files.
#28984
bucket_map/Cargo.toml
Outdated
@@ -14,6 +14,7 @@ edition = "2021"
log = { version = "0.4.17" }
memmap2 = "0.5.3"
modular-bitfield = "0.11.2"
procfs = "0.14.1"
does this crate handle the non-atomicity of multi-page reads in procfs?
bucket_map/src/bucket_storage.rs
Outdated
fn get_mmap_count() -> Option<usize> {
    let pid = std::process::id();
    let map_path = format!("/proc/{}/maps", pid);
    let output = std::process::Command::new("wc")
we can avoid shelling out with something like https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html. just toss a .count() on after .lines()
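A minimal sketch of that suggestion, using only the standard library. The helper names `count_lines` and `mmap_count_of_self` are hypothetical, and `/proc/self/maps` is assumed as a shorthand for the `/proc/{pid}/maps` path built in the diff; it only exists on Linux.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Count lines from any buffered reader. `lines()` decodes each line as
/// UTF-8 and allocates a `String` for it, which is the cost discussed in
/// this thread.
pub fn count_lines<R: BufRead>(reader: R) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        line?; // propagate read or decoding errors instead of hiding them
        count += 1;
    }
    Ok(count)
}

/// Count the memory mappings of the current process (Linux only).
pub fn mmap_count_of_self() -> io::Result<usize> {
    let file = File::open(Path::new("/proc/self/maps"))?;
    count_lines(BufReader::new(file))
}
```

Taking a generic `BufRead` keeps the counting logic testable without procfs; the counted value is still only a point-in-time approximation, since the file can change while it is being read.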
we may want to copy the maps file to a temporary first, to avoid being bitten by non-atomic reads in procfs
It looks like readlines is slower than wc -l.
Time elapsed is: 3.10014196s
I guess /proc/pid/maps is an in-memory kernel data structure, which is faster to read than a file on disk.
> I guess /proc/pid/maps is an in-memory kernel data structure, which is faster to read than a file on disk.
yeah it is. also means it can be updated from beneath you. which is where the non-atomicity issues come in that i keep crowing on about 😅
Yes. But since this is used to report a metric, a small difference, caused by not being atomic, would be fine, I think.
bucket_map/src/bucket_map.rs
Outdated
std::fs::copy(map_path, copy_path.as_os_str()).unwrap();

let file = std::fs::File::open(copy_path).ok()?;
Some(std::io::BufReader::new(file).lines().count())
how's timing in comparison?
wc-l 1.5s
wc-l-copy 1.7s
readline 3.1s
looks like wc-l without copy is the fastest.
that's unexpected 🤔
did you try .lines() w/o copy?
yeah. .lines() w/o copy takes 2.87s
seems that wc is better optimized than rust lines() fn.
weird. i wonder if it's the utf-8 handling on the rust side. wc is probably dumb and assumes ascii 🤔
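One way to test that guess would be to count newline bytes directly, skipping UTF-8 validation and per-line `String` allocation. This is a sketch under that assumption, not something measured in this thread; `count_newlines` and the 64 KiB buffer size are hypothetical choices.

```rust
use std::io::{self, Read};

/// Count `\n` bytes in fixed-size chunks, with no UTF-8 decoding and no
/// per-line allocation. Like `wc -l`, this counts newline characters, so a
/// final line without a trailing newline is not counted.
pub fn count_newlines<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = [0u8; 64 * 1024];
    let mut count = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(count), // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        count += buf[..n].iter().filter(|&&b| b == b'\n').count();
    }
}
```

Passing `std::fs::File::open("/proc/self/maps")?` as the reader would give the mmap count on Linux; whether this closes the gap to `wc -l` would need to be timed on a real validator.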
Co-authored-by: Michael Vines <[email protected]>
make open_fd stats only on linux platform
I assume there are no more questions. Merge it now. Feel free to ping me if you have any more comments. Thanks!
sorry, i forgot to update here. i'm really not keen on shelling out. we're going to find that one stupid distro that doesn't have
err, yuck. 👎🏼 to adding that system
The other big source of mmap is accounts index mmap files, which should be addressed in this pr
ok. I will revert it.
only map consumers besides accountsdb and accounts-index are anon mappings (which I don't think we make explicitly), the executable file and shared-objects
Another big consumer of open fd is from rocksdb. Maybe there is a way to get the open fd from rocksdb?
are those the anon maps? i didn't see anything with a file path under the ledger dir
ah ha! so we can count anon maps out of procfs, but afaik there's no way to link them to their caller
yes. count procfs lines will include them. But I doubt there is an API on rocksdb to report it.
random sampling
Problem
Recently, in kin-sim, a validator crashed after reaching the maximum open-file limit.
Tracking the actual number of open files and the max open-file limit of the validator process would help with debugging and troubleshooting.
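As an illustration of the two values in question, here is a minimal std-only sketch for Linux. It is not the PR's implementation (the review threads show the PR using the `procfs` crate); the function names are hypothetical, and it assumes the usual `/proc/self/fd` directory and the `Max open files` row of `/proc/self/limits`.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Count directory entries. For `/proc/self/fd` this is the number of open
/// fds, including the one `read_dir` itself holds while iterating.
pub fn count_dir_entries(dir: &Path) -> io::Result<usize> {
    let mut count = 0;
    for entry in fs::read_dir(dir)? {
        entry?;
        count += 1;
    }
    Ok(count)
}

/// Extract the soft "Max open files" limit from the text of
/// `/proc/<pid>/limits`. Returns `None` if the row is missing or the
/// limit is `unlimited`.
pub fn parse_max_open_files(limits: &str) -> Option<u64> {
    limits
        .lines()
        .find_map(|line| line.strip_prefix("Max open files"))
        .and_then(|rest| rest.split_whitespace().next())
        .and_then(|soft| soft.parse().ok())
}

/// Linux-only convenience wrapper: (open fd count, soft fd limit).
pub fn open_fd_stats() -> io::Result<(usize, Option<u64>)> {
    let open = count_dir_entries(Path::new("/proc/self/fd"))?;
    let limits = fs::read_to_string("/proc/self/limits")?;
    Ok((open, parse_max_open_files(&limits)))
}
```

Reporting both numbers together makes it possible to alert on the ratio of open fds to the limit rather than on an absolute count.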
Summary of Changes
Fixes #